Life Hacks: Phone/Email Extractor

You work for a company and have been asked to find the phone numbers and email addresses in a long Excel file. You start scrolling through the document, which seems to go on endlessly. You know you will have to put in a lot of overtime to finish the work. You glance at your phone and see your son’s photo. You won’t be spending time with your family because of the workload. But suddenly you have an idea.

Python makes this task really easy with regular expressions, so that is what we are going to use.
Note: Reproducibility was the most important consideration while writing the program.

We import the following packages:

  1. re - the regular expression package from the standard library
  2. pyperclip - a third-party package that lets us work with the clipboard

Steps:

  1. Write the regular expressions
  2. Read the text from the clipboard using pyperclip
  3. Print the matched result

Regular expressions are very useful, but they are not without their downsides. It is important to get the syntax of the regex (regular expression) right, or the whole thing will go wrong.
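
To illustrate how a small syntax slip changes the result, here is a minimal sketch that compares two patterns compiled with re.X (verbose mode), the same flag used in this post. In verbose mode, unescaped whitespace in the pattern is ignored, so a space typed as a separator silently disappears. The sample string and the names `wrong` and `right` are made up for this illustration.

```python
import re

sample = "Call 415 555 1234 today"  # made-up sample text

# In verbose mode (re.X) the literal spaces in this pattern are ignored,
# so it really means \d{3}\d{3}\d{4} (ten digits in a row) and finds nothing.
wrong = re.compile(r'''\d{3} \d{3} \d{4}''', re.X)

# Writing the separator explicitly as \s makes the pattern behave as intended.
right = re.compile(r'''\d{3} \s \d{3} \s \d{4}''', re.X)

print(wrong.findall(sample))  # []
print(right.findall(sample))  # ['415 555 1234']
```

This is why the patterns in this post spell out every separator as (\s|-|\.) instead of relying on spaces typed into the pattern.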

import re

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))   # Area code, with or without parentheses
    (\s|-|\.)           # Separator
    (\d{4}|\d{3})       # Middle 3 or 4 digits
    (\s|-|\.)           # Separator
    (\d{4})             # Last 4 digits
)''', re.X)

emailRegex = re.compile(r'''(
    [a-zA-Z0-9.+_%-]+   # Username
    @                   # @ symbol
    [a-zA-Z0-9.-]+      # Domain name
    (\.[a-zA-Z]{2,4})   # Dot something (most likely .com)
)''', re.X)
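
Because both patterns contain capturing groups, findall returns one tuple per match instead of a plain string. Index 0 holds the whole match (the outer parentheses) and the later indices hold the individual pieces. The sketch below repeats the two patterns so that it runs on its own; the sample text and the names `phone_groups` and `email_groups` are made up for illustration.

```python
import re

phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))   # area code
    (\s|-|\.)           # separator
    (\d{4}|\d{3})       # middle 3 or 4 digits
    (\s|-|\.)           # separator
    (\d{4})             # last 4 digits
)''', re.X)

emailRegex = re.compile(r'''(
    [a-zA-Z0-9.+_%-]+   # username
    @                   # @ symbol
    [a-zA-Z0-9.-]+      # domain name
    (\.[a-zA-Z]{2,4})   # dot something
)''', re.X)

sample = "Reach me at 435-324-4323 or jane.doe@example.com"  # made-up text

phone_groups = phoneRegex.findall(sample)
email_groups = emailRegex.findall(sample)

print(phone_groups)  # [('435-324-4323', '435', '-', '324', '-', '4323')]
print(email_groups)  # [('jane.doe@example.com', '.com')]
```

For a phone match, indices 1, 3 and 5 are the area code, the middle digits and the last four digits, which is what the full program joins back together with dashes.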

The next step can be done in several ways. The most general way is to copy the text and use pyperclip to read it from the clipboard.
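
One way to keep this input step flexible is to hide it behind a small function. The sketch below is only an assumption about how that could look and is not part of the original program: the helper name `load_text` and its `path` parameter are hypothetical. It reads a file when a path is given and otherwise falls back to pyperclip.paste().

```python
from pathlib import Path


def load_text(path=None):
    """Return the text to search (hypothetical helper, not in the original program).

    If `path` is given, read that file; otherwise read the clipboard.
    """
    if path is not None:
        return Path(path).read_text(encoding="utf-8")
    # Imported here so the file-based route works even without pyperclip installed.
    import pyperclip
    return str(pyperclip.paste())
```

Calling load_text() with no argument uses the clipboard, while load_text("contacts.txt") would read a file of that (made-up) name, which makes it easier to rerun the extractor on the same input.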

The final step is to print the results in the desired format.
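
As one possible output format, the sketch below groups the results under a single heading per category and drops duplicates while keeping the order in which they were first seen. The function name `format_report` and the sample values are made up for illustration; they are not part of the original program.

```python
def format_report(phones, emails):
    """Build a printable report from two lists of strings (hypothetical helper)."""
    lines = []
    for heading, items in (("Phone numbers", phones), ("Emails", emails)):
        # dict.fromkeys removes duplicates but preserves insertion order.
        unique = list(dict.fromkeys(items))
        if unique:
            lines.append(heading + ":")
            lines.extend("  " + item for item in unique)
    return "\n".join(lines) if lines else "No phone numbers or emails found."


report = format_report(
    ["435-324-4323", "435-324-4323"],  # made-up sample data with a duplicate
    ["jane.doe@example.com"],
)
print(report)
```

Returning a string instead of printing inside the function keeps the formatting easy to check and reuse, for example when writing the report to a file.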

Thoughts: I definitely think this will be useful to me. I can modify it to build my own web crawler, and I can think of many scenarios where it would be useful.

The whole program is written below for reference.

## Phone number and email extractor

import re
import pyperclip

# pyperclip is used to read the contents of the clipboard
# print(pyperclip.paste())

phoneRegex = re.compile(r'''(
(\d{3}|\(\d{3}\)) #Area Code
(\s|-|\.) #Separator
(\d{4}|\d{3}) #middle 3 or 4 digits
(\s|-|\.) #Separator
(\d{4}) #Last 4 digits
)''',re.X)


# print(phoneRegex.findall("easfgd 435-324-4323 anf 535-2452-3424 "))

emailRegex=re.compile(r'''(
[a-zA-Z0-9.+_%-]+ #Username
@ #@symbol
[a-zA-Z0-9.-]+ #domain name
(\.[a-zA-Z]{2,4}) #dot something(most likely com)
)''',re.X)

# print(emailRegex.findall("email is fdsdgr@gmai.com"))

text = str(pyperclip.paste())
matches = []
for groups in phoneRegex.findall(text):
    # groups[0] is the whole match; 1, 3 and 5 are the area code,
    # the middle digits and the last 4 digits
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    matches.append('Phone numbers')
    matches.append(phoneNum)

for groups in emailRegex.findall(text):
    matches.append("Emails")
    matches.append(groups[0])  # groups[0] is the full email address

for match in matches:
    print(match)